Data Sources

The data sources for this project are threefold: solar radiation and weather data, solar energy generation and consumption data, and solar stock data. All code for this tab can be found here.

Solar Radiation and Weather Data

Using the NSRDB Viewer from the National Solar Radiation Database (NSRDB), I selected a point in the San Joaquin Valley (the most fertile farming area in the U.S.) and extracted solar radiation and weather variables for that location at daily frequency from 1998 to 2020. The exact data can be downloaded here. The same site can be used to extract data for any latitude and longitude. A look at the solar radiation data can be seen below:

California Energy Generation and Consumption

The U.S. Energy Information Administration (EIA) stores information on all forms of energy generation and consumption. It offers an API that returns up to 5000 rows of data per request. It also has a point-and-click interface, which can be found here, that builds the API request for you. The code used to unpack the JSON is below:

# httr provides GET() and content()
library(httr)

# Define the API endpoint and parameters
# The API key is read from the EIA_API_KEY environment variable instead of being hard-coded
endpoint <- paste0('https://api.eia.gov/v2/electricity/electric-power-operational-data/data/?frequency=monthly&data[0]=ash-content&data[1]=consumption-for-eg&data[2]=consumption-for-eg-btu&data[3]=consumption-uto&data[4]=consumption-uto-btu&data[5]=cost&data[6]=cost-per-btu&data[7]=generation&data[8]=heat-content&data[9]=receipts&data[10]=receipts-btu&data[11]=stocks&data[12]=sulfur-content&data[13]=total-consumption&data[14]=total-consumption-btu&facets[sectorid][]=98&facets[fueltypeid][]=AOR&facets[fueltypeid][]=NG&facets[fueltypeid][]=SUN&facets[fueltypeid][]=WND&facets[location][]=CA&start=2001-01&end=2022-10&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000&api_key=', Sys.getenv('EIA_API_KEY'))
# Send the GET request and stop with an error if the HTTP status indicates failure
response <- GET(endpoint)
stop_for_status(response)

# Parse the JSON response
data <- content(response)

## Row-bind all the JSON records into a data frame (each column is a list)
consumption_raw <- as.data.frame(do.call(rbind, data$response$data))

## Replace NULL values with NA in each column so that no entries are dropped
consumption_lists <-
  lapply(consumption_raw, function(x) {
    lapply(x, function(y) {
      if (is.null(y)) NA else y
    })
  })

## Convert columns from lists to vectors
consumption <- data.frame(lapply(consumption_lists, unlist))

## Write to CSV to avoid calling the API again
write.csv(consumption, 'data/consumption_CA_2001_2022.csv', row.names = FALSE)
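
Because the API returns at most 5000 rows per request, a query that matches more rows than that has to be fetched in pages by changing the offset parameter. The sketch below is one possible way to do that, not the code used for this project. The names page_offsets and fetch_all_pages are hypothetical, base_url is assumed to be the endpoint above without its offset and length parameters, and total is assumed to be a row count you already know (for example, from a first request).

```r
# Build the `offset` values needed to page through `total` rows.
# Returned as integers so that pasting them into a URL never yields "1e+05".
page_offsets <- function(total, page_size = 5000) {
  if (total <= 0) return(integer(0))
  as.integer(seq(from = 0, to = total - 1, by = page_size))
}

# Hypothetical helper, defined here but not called: fetch every page and
# concatenate the records. Requires the httr package and network access.
# `base_url` is assumed to already contain the query and the API key.
fetch_all_pages <- function(base_url, total, page_size = 5000) {
  pages <- lapply(page_offsets(total, page_size), function(off) {
    url <- paste0(base_url, "&offset=", off, "&length=", page_size)
    httr::content(httr::GET(url))$response$data
  })
  do.call(c, pages)
}

# Example: 12000 matching rows need three requests.
offsets <- page_offsets(12000)
```

The list returned by fetch_all_pages would have the same shape as data$response$data, so it could be passed through the same unpacking steps shown above.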

Now we can take a quick look at the data. Notably, solar energy consumption is larger than utility-scale solar power generation. This is due to the widespread use of household solar panels to provide homes with energy.

Stock Data

Lastly, I will look at stock data using the quantmod library and Yahoo Finance. This data will be used to examine the performance of solar energy companies, particularly those that focus on serving California residents. With it, I can track the financial performance of these companies over time and see how it relates to power generation and to other variables such as solar radiation and weather.
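
As a rough sketch of how this could work, the code below defines a helper that downloads daily prices with quantmod and a second helper that averages a daily series by calendar month, so it can be lined up with the monthly energy data. This is an illustration under assumptions, not the project's actual code: fetch_adjusted_close and monthly_mean are hypothetical names, the choice of adjusted close and of a monthly mean is an assumption, and the values used in the example are toy numbers rather than real prices.

```r
# Hypothetical helper, defined here but not called: download daily prices for
# one ticker from Yahoo Finance. Requires quantmod (and zoo) plus network access.
fetch_adjusted_close <- function(ticker, from, to) {
  prices <- quantmod::getSymbols(ticker, src = "yahoo", from = from, to = to,
                                 auto.assign = FALSE)
  adj <- quantmod::Ad(prices)  # adjusted close column
  data.frame(date = as.Date(zoo::index(adj)), adjusted = as.numeric(adj))
}

# Average a daily series by calendar month ("YYYY-MM").
monthly_mean <- function(dates, values) {
  period <- format(as.Date(dates), "%Y-%m")
  out <- aggregate(values, by = list(period = period), FUN = mean)
  names(out)[2] <- "mean_value"
  out
}

# Toy values for illustration only (not real prices).
toy_dates   <- as.Date(c("2020-01-02", "2020-01-03", "2020-02-03"))
toy_values  <- c(10, 12, 20)
toy_monthly <- monthly_mean(toy_dates, toy_values)
```

With real data, the date and adjusted columns returned by fetch_adjusted_close could be passed to monthly_mean, and the resulting period column could then be matched against the monthly periods in the energy data.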